Performance Analysis of User-Level PIM Communication in the Data IntensiVe Architecture (DIVA) System

نویسندگان

  • Sumit D. Mediratta
  • Jeffrey T. Draper
چکیده

The performance of user-level messaging in PIM (Processing-InMemory) to PIM communication is modeled and analyzed for the DIVA (Data IntensiVe Architecture) system. Six benchmarks have been used for this purpose, two from each category, namely single message transfer, parallel transfer and collective communication, as described for the PMB (Pallas MPI Benchmarks). The benchmarks used are PingPong, PingPing, SendReceive, Exchange, Barrier synchronization and AllToAll personalized exchange. The main significance of this work lies in the evaluation of an implementation of system-wide support for memory-to-memory and memory-to-host communication via a parcel buffer (used as a network interface). Another remarkable feature of this evaluation lies in presenting an optimal algorithm for Barrier synchronization and an optimal algorithm, with full channel utilization, for AllToAll personalized exchange for the bi-directional ring configuration of up to 8 DIVA PIMs in the memory system of a Hewlett-Packard’s zx6000 server. The algorithms presented can be scaled for higher number of PIM chips with a little degradation in performance over the optimal solution. Our analysis shows that the currently employed communication mechanism can be used very efficiently for collective communication operations, and it also exposes the bottlenecks in the current design for future improvements.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

[IBMMot94] IBM Microelectronics and Motorola Inc., “The PowerPC Microprocessor Family: The Programming Environments,” IBM Microelectronics Document MPRPPCFPE-01, Motorola Document MPCFPE/AD (9/94). [IBM99] IBM Microelectronics, “IBM chip advances spur “sy

The DIVA (Data IntensiVe Architecture) system incorporates Processing-In-Memory (PIM) chips as smart-memory coprocessors to a microprocessor, with a separate memory-to-memory interconnect, thus exploiting the inherent memory bandwidth both on chip and across the system. DIVA targets several classes of bandwidth-limited applications, including multimedia applications and pointer-based and sparse...

متن کامل

A Prototype Processing-In-Memory (PIM) Chip for the Data-Intensive Architecture (DIVA) System

The Data-Intensive Architecture (DIVA) system employs Processing-In-Memory (PIM) chips as smartmemory coprocessors. This architecture exploits inherent memory bandwidth both on chip and across the system to target several classes of bandwidth-limited applications, including multimedia applications and pointer-based and sparse-matrix computations. The DIVA project has built a prototype developme...

متن کامل

8.0 Conclusions and Future Work

Processing-in-memory (PIM) chips that integrate processor logic into memory devices offer a new opportunity for bridging the growing gap between processor and memory speeds, especially for applications with high memory-bandwidth requirements. The Data-IntensiVe Architecture (DIVA) system combines PIM memories with one or more external host processors and a PIM-to-PIM interconnect. DIVA increase...

متن کامل

The Address Translation Unit of the Data-Intensive Architecture (DIVA) System

The Data-Intensive Architecture (DIVA) system incorporates Processing-In-Memory (PIM) chips as smart-memory coprocessors to a microprocessor. This architecture exploits inherent memory bandwidth both on chip and across the system. Thus, performances of pointer-based and sparse-matrix computations as well as multimedia applications are significantly enhanced. A key feature of the DIVA architectu...

متن کامل

Evaluation of Architectural Paradigms for Addressing the Processor-Memory Gap

Many high performance applications run well below the peak arithmetic performance of the underlying machine, with inefficiencies often attributed to poor memory system behavior. In the context of scientific computing we examine three emerging processors designed to address the wellknown gap between processor and memory performance through the exploitation of data parallelism. The VIRAM architec...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005